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ABSTRACT 

The current annotation of the human genome includes more than 12,000 long intergenic noncoding RNAs (lincRNA). While a 
handful of lincRNA have been shown to play important regulatory roles, the functionality of most remains unclear. Here, we 
examined the expression conservation and putative functionality of lincRNA in human and macaque prefrontal cortex (PFC) 
development and maturation. We analyzed transcriptome sequence (RNA-seq) data from 38 human and 40 macaque 
individuals covering the entire postnatal development interval. Using the human data set, we detected the expression of 5835 
lincRNA annotated in GENCODE and further identified 1888 novel lincRNA. Most of these lincRNA show low DNA sequence 
conservation, as well as low expression levels. Remarkably, developmental expression patterns of these lincRNA were as 
conserved between humans and macaques as those of protein-coding genes. Transfection of development-associated lincRNA 
into human SH-SY5Y cells affected gene expression, indicating their regulatory potential. In brain, expression of these putative 
target genes correlated with the expression of the corresponding lincRNA during human and macaque PFC development. 
These results support the potential functionality of lincRNA in primate PFC development. 
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INTRODUCTION 

Long intergenic noncoding RNA (lincRNA) is a cumulative 
RNA category describing transcripts of length >200 nt that 
have a low protein-coding potential. Common lincRNAs 
are transcribed by Pol II polymerase and, as with other Pol 
II transcripts, are capped, polyadenylated, and frequently 
spliced (Guttman et al. 2009). Numbers of lincRNA genes 
in humans and other species continue to increase steadily 
due to both computational (Jia et al. 2010) and experimental 
(Guttman et al. 2009, 2010; Khalil et al. 2009; Cabili et al. 
2011; Ulitsky et al. 2011; Nam and Bartel 2012; Young 
et al. 2012) efforts. Thus, using chromatin-state maps, Gutt- 
man et al. discovered ~1600 lincRNAs in mouse cell lines 
(Guttman et al. 2009), while Khalil et al. identified 3269 
lincRNAs in human cell lines, including human embryonic 
stem cells (Khalil et al. 2009). With the help of high-through- 
put RNA sequencing (RNA-seq), Guttman et al. (2010) and 
Cabili et al. (2011) identified thousands of further lincRNA 
gene candidates in the mouse and human genomes, respec- 



4 These authors contributed equally to this work. 
Corresponding author 
E-mail khaitovich@eva.mpg.de 

Article published online ahead of print. Article and publication date are at 
http://www.rnajournal.org/cgi/doi/10.1261/rna.043075.113. Freely available 
online through the RNA Open Access option. 



tively. Finally, detailed characterization of the human tran- 
scriptome and epigenome by the ENCODE project resulted 
in the annotation of a total of 13,248 lincRNA genes (Derrien 
et al. 2012). 

Expression of long noncoding RNAs is not restricted to 
mammals. By integrating chromatin-state maps and RNA- 
seq data, Ulitsky et al. identified more than 500 lincRNA 
genes in zebrafish (Ulitsky et al. 2011). Further, Young et 
al. identified 1119 putative lincRNA genes in the fruit fly 
(Drosophila melanogaster) genome using modENCODE tran- 
scriptome data (Young et al. 2012), while Nam et al. found 
170 lincRNA genes in the nematode worm (Caenorhabditis 
elegans) genome (Nam and Bartel 2012). 

Across species, lincRNAs tend to share the same character- 
istic features: low DNA sequence conservation, relatively low 
expression levels, and high spatial and temporal expression 
specificity (Cabili et al. 201 1; Ulitsky et al. 201 1). Given these 
properties, the functional relevance of lincRNA expression 
has been questioned (Ponjavic et al. 2007). Nonetheless, 
the functional importance has been demonstrated for a num- 
ber of lincRNAs, including XIST, HOTTIP, and HOTAIR 
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(Brown et al. 1992; Rinn et al. 2007; Wang et al. 201 1). These 
lincRNAs were shown to play roles in transcriptional regula- 
tion through changes in chromatin state. Furthermore, by 
examining the effects of lincRNA knock-downs in mouse 
embryonic stem cells, Guttman et al. demonstrated that 
17.7% of the tested 147 lincRNA may serve as specific trans- 
regulators of protein-coding genes during cellular differenti- 
ation (Guttman et al. 2011). This, together with examples 
of regulatory roles for lincRNA CRNDE (Ellis et al. 2012) 
and linc-MDl (Cesana et al. 2011) in brain and muscle de- 
velopment, indicates that at least some lincRNAs may func- 
tion as regulators during cell and tissue development and 
differentiation. 

Previous studies have shown that multiple lincRNAs are 
expressed in the mouse (Mercer et al. 2008, 2010) and human 
brains (Ng et al. 2012). Here, we investigated patterns, con- 
servation, and potential functional roles of lincRNA expres- 
sion during human and macaque brain development and 
maturation in one specific brain region: prefrontal cortex 
(PFC). 

RESULTS 

LincRNA identification and characterization 

We analyzed lincRNA expression in the human brain using 
RNA-seq time series data from the PFC region of 38 healthy 
human individuals with ages from 2 d to 61 yr, as well as 40 
healthy macaque individuals with ages from 70 d before birth 
to 21 yr of age (Supplemental Table SI). The data set contains 
a total of 378 million human single- ended reads and 446 mil- 
lion macaque single-ended reads of 100 nt in length (Supple- 
mental Table SI). 

To identify novel lincRNA expressed 
in human PFC development, we first 
mapped the reads to the human genome 
(hgl9) using TopHat (v.2.0.6) (Trapnell 
et al. 2009), allowing up to two mis- 
matches. This resulted in 337 million 
(89.1%) uniquely mapped reads (Sup- 
plemental Table SI). Using the lincRNA 
identification procedure described in 
Cabili et al. (2011), we found 1888 novel 
putative human lincRNAs expressed in 
the PFC (Fig. 1A) that did not overlap 
with the GENCODE (v. 16) annotation. 
Of them, 111 were also annotated by 
Cabili et al. (2011), while the others 
have not been previously reported. 

To quantify the expression of novel 
and annotated lincRNAs in human and 
macaque samples in an unbiased manner, 
we mapped human and macaque RNA- 
seq reads to a consensus genome, con- 
structed based on a pairwise whole-ge- 



nome alignment of the human (hgl9) and the rhesus 
macaque (rheMac3) genomes. Using the STAR mapping soft- 
ware and default parameters, 268 million (70.67%) human 
reads and 311 million (69.77%) macaque reads could be 
mapped uniquely to the consensus genome. Expression levels 
of annotated and novel lincRNAs were quantified as the num- 
ber of reads per kilobase per million mapped reads (RPKM). 
Low expressed lincRNAs (with a maximum expression below 
1 RPKM or detected in less than half of the samples within 
one species) were removed from further analyses. In humans, 
1061 lincRNAs, including 167 novel ones, were expressed 
above this detection threshold. 

In agreement with previous studies (Guttman et al. 2010), 
expression levels of lincRNAs were on average lower than ex- 
pression levels of protein-coding genes. Furthermore, the ex- 
pression levels of novel lincRNAs were lower than expression 
levels of annotated lincRNA (Fig. IB). Novel lincRNAs were 
also less conserved at the DNA sequence level than annotated 
lincRNAs (one-sided Wilcoxon test, P< 0.0001) (Fig. 1C). 
Notably, both 894 annotated and 167 novel lincRNAs ex- 
pressed in PFC were significantly more conserved than the 
annotated or novel lincRNAs with low or no detectable ex- 
pression in the PFC (one-sided Wilcoxon test, P< 0.0001) 
(Fig. 1C). Similar results were obtained by mapping human 
RNA-seq data to the reference human genome (hgl9) instead 
of the human-macaque consensus genome (Supplemental 
Fig. 1). This finding is noteworthy as it parallels previous ob- 
servations of higher sequence conservation of protein-coding 
genes expressed in brain (Jordan et al. 2005; Tuller et al. 
2008), which are further confirmed using this RNA-seq 
data set (Supplemental Fig. 2). Overall sequence conservation 
of lincRNAs measured as the average phastCon scores of 
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FIGURE 1. Expression and conservation of novel and annotated human lincRNAs. (A) 
Distributions of the average expression levels of novel (blue) and annotated (red) lincRNAs, as 
well as annotated protein- coding genes (gray), in human PFC samples measured in RPKM (reads 
per kilobase of exon model per million mapped reads). (B) Sequence conservation of lincRNA 
genes. Shown are median phastCon scores of novel lincRNAs expressed (blue) and not expressed 
(green) in the human PFC, annotated lincRNAs expressed (red) and not expressed (orange) in the 
human PFC, as well as median phastCon scores of simple repeats (gray) used as a nonconserved 
sequence reference. PhastCon scores were calculated based on 1 1 primate genomes. "Expressed 
lincRNAs" are defined as lincRNA detected in at least half of the samples with maximal expres- 
sion > 1 RPKM. The box plots show variation of the median conservation estimates calculated by 
bootstrapping over phastCon score values 1000 times. 
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transcribed regions was still substantially lower than the se- 
quence conservation of protein- coding gene exons (median 
= 0.139, one-sided KS-test, P < 0.0001). 

In agreement with lower DNA sequence conservation of 
the lincRNA genes, a smaller proportion of lincRNA was 
detected to be expressed in both species compared to the 
protein-coding genes. Using the consensus genome, 13,722 
(93%) of 14,745 protein-coding genes expressed in human 
PFC at RPKM > 1 passed this detection threshold in macaque 
PFC. In contrast, only 514 (48%) of 1061 lincRNAs expressed 
in human PFC at RPKM > 1 passed this detection threshold 
in macaque PFC. Mapping human and macaque RNA-seq 
data to the respective genomes, resulted in similar observa- 
tions: 10,052 (93%) of 10,820 protein- coding genes and 
391 (51%) of 770 lincRNAs expressed in human PFC at 
RPKM > 1 passed this detection threshold in macaque PFC. 
These results support previous observations of lower se- 
quence conservation and greater turnover rate for lincRNA 
genes (Kutter et al. 2012). 

LincRNA expression changes with age 

Among 1061 lincRNAs expressed in the human PFC time se- 
ries, 409 (38.5%) showed significant expression level changes 
with age (age-test) (Somel et al. 2009) — P < 0.05 after Benja- 
mini-Hochberg correction, permutation-based FDR < 5%. 
Similarly, 45.9% of protein-coding genes expressed in PFC 
showed significant expression level changes with age. The 
lower proportion of age-related lincRNAs was partly due 
to their lower average expression: after subsampling the pro- 
tein-coding genes based on the distribution of lincRNA 
expression levels, the proportion of age-related protein-cod- 
ing genes (42.6 ± 1.4%) was closer to that of lincRNA (Sup- 
plemental Fig. 3). 

To test the reproducibility of observed age-related changes 
in lincRNA expression, we used a published RNA-seq time 
series data set containing measurements from the PFC of 
14 healthy human individuals (Mazin et al. 2012). Of 409 
lincRNAs showing age-related expression profiles in the orig- 
inal time series data, 292 were expressed in both data sets. 
Developmental expression trajectories of these lincRNA 
were consistent between the two data sets, as shown by the 
Pearson correlation coefficient distribution (Supplemental 
Fig. 4). Specifically, 209 of the 292 lincRNAs (71.6%) showed 
significant positive correlation between the data sets (one- 
sided Pearson correlation test, P < 0.05). Importantly, repro- 
ducibility of lincRNA expression profiles was comparable to 
that of the age-related protein- coding genes sampled from 
the same expression level distribution: 75.8 ± 2.3% of them 
showed significant positive correlation between the data 
sets (Supplemental Fig. 4). 

To obtain an overview of expression patterns formed by 
the 409 age-related lincRNAs and the 6771 age-related pro- 
tein-coding genes, we grouped them into clusters according 
to eight main expression patterns (Fig. 2A). When compared 



to protein-coding genes, age-related expressed lincRNAs 
were enriched in two patterns, Group 3 and Group 8, both 
showing decreases in gene expression levels in late develop- 
ment (x 2 test, P < 0.05 after multiple testing correction) 
(Supplemental Fig. 5). Protein-coding genes in these two 
groups were significantly enriched in Gene Ontology (GO) 
terms that included "plasma membrane," "neurological sys- 
tem process," "synaptic transmission," "calcium ion binding," 
"transmission of nerve impulse," and "cell-cell signaling" 
(Supplemental Table S2). 

We next tested whether the human age-related lincRNA 
expression patterns are conserved in macaques. Based on ma- 
caque RNA-seq reads mapped to the consensus genome, ex- 
pression of 478 annotated and 36 novel human lincRNAs 
could be detected above the expression level cutoff of one 
RPKM, in both human and macaque samples. Of these, 
196 and 1 1 showed significant age-related expression changes 
in the human time series, respectively. Despite the relatively 
small overlap of lincRNA genes expressed in human and ma- 
caque PFC development that could be observed above the 
detection threshold, the developmental expression profiles 
of these 196 annotated and 11 novel lincRNAs correlated 
positively and significantly between the human and macaque 
time series (median r = 0.57 and r = 0.48 for annotated and 
novel lincRNAs, respectively, permutation g < 0.001). Im- 
portantly, this level of developmental profile conservation 
was comparable to the one observed for protein-coding genes 
sampled from the same expression level distribution (median 
r— 0.60 ± 0.05) (Fig. 2B). This result was not caused by the 
use of the consensus genome, as mapping RNA-seq data to 
the human and rhesus macaque genomes separately yielded 
consistent observations (Supplemental Fig. 6). Thus, despite 
the lower DNA sequence conservation and higher turnover 
rate of lincRNAs, developmental expression trajectories of 
lincRNAs and protein-coding genes expressed in both species 
showed comparable conservation levels. 

For protein-coding genes, expression level conservation is 
known to correlate with the DNA sequence conservation of 
the transcribed region, as well as the core promoter elements 
(Khaitovich et al. 2005; Donaldson and Gottgens 2006). To 
test whether this holds true for lincRNAs, we compared the 
conservation of lincRNA developmental expression profiles 
with the DNA sequence conservation of lincRNA transcribed 
regions, as well as the core promoter regions situated 200 bp 
upstream of and 50 bp downstream from the transcription 
start site. Using average phastCon scores as a proxy for 
DNA sequence conservation, we, indeed, found a significant 
correlation between the conservation of lincRNAs' develop- 
mental expression profiles and the sequence of the tran- 
scribed regions (Spearman correlation, p = 0.177, P = 0.01) 
and core promoter regions (Spearman correlation, p = 
0.150, P = 0.03). The significance of these effects was addi- 
tionally validated by 1000 permutations of gene labels (P< 
0.02) (Supplemental Fig. 7). Notably, the strength of the 
correlation between the conservation of developmental 



www.rnajournal.org 1 1 05 



He ef al. 




0 1 10 50 0 1 10 50 0 1 10 50 0 1 10 50 0 1 10 50 

Age (years) Age (years) Age (years) Age (years) Age (years) 



Group 1 (793) Group 2 (837) Group 3 (1107) Group 4 (550) Group 5 (1817) 




0 1 10 50 0 1 10 50 0 1 10 50 

Age (years) Age (years) Age (years) 



FIGURE 2. Developmental expression patterns of lincRNAs and protein-coding genes in human and macaque PFC. (A) Expression patterns of 
lincRNA (red: human, light red: macaque) and protein-coding genes (blue: human, light blue: macaque) defined as age-related in human PFC 
development. Expression values were z-transformed before plotting. Numbers in brackets above each panel indicate numbers of lincRNAs or 
protein-coding genes in the cluster. On all plots, macaque ages are linearly transformed to the human age scale. (B) Distribution of Pearson cor- 
relation coefficients (PCC) based on the comparison of developmental expression profiles between humans and macaques for annotated lincRNAs 
(orange), novel lincRNAs (green), and protein-coding genes (gray). Confidence intervals of the PCC distribution for protein-coding genes are 
shown by the dashed gray lines and were calculated by subsampling protein-coding genes according to the expression distribution of lincRNA genes 
1000 times. 



expression profiles and the DNA sequence observed for 
lincRNAs was comparable to that observed in our data for 
protein-coding genes: p = 0.163 for transcribed regions and 
p = 0.162 for core promoter regions (P< 0.001 calculated 
in 1000 permutations of gene labels) (Supplemental Fig. 7). 

LincRNA transfection 

To obtain deeper insights into the functions of lincRNAs in 
PFC development, we overexpressed three lincRNA tran- 
scripts showing age-related expression in the human PFC 
in a human neuroblastoma cell line (SH-SY5Y) (Supplemen- 
tal Table S3). We tested the effect of lincRNA overexpression 
by examining the transcriptomes of cells transfected with 



lincRNA constructs or with an empty vector (mock transfec- 
tion). All transfection experiments were carried out in tripli- 
cate. The cell line transcriptomes were examined 24 h after 
transfection using RNA-seq. The expression levels of the 
three transfected lincRNAs were within the expression range 
of endogenous transcripts (Fig. 3A). 

The overexpression of each of the three lincRNA tran- 
scripts showed significant effects on gene expression in the 
cell line, affecting the expression levels of 42, 56, and 103 
genes — 19, 30, and 87 of them protein-coding ([edgeR] 
[Robinson et al. 2010], FDR < 10% after Benjamini-Hoch- 
berg correction) (Fig. 3B). To assess whether overexpression 
itself results in the observed changes in cell transcript ex- 
pression, we compared genes affected by overexpression of 
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FIGURE 3. The effect of lincRNA overexpression on putative target genes in SY5Y cells. (A) Expression level distributions of protein-coding genes 
(green) and lincRNAs (blue) in the SH-SY5Y human neuroblastoma cell line. The red arrows mark the average expression levels of the three trans- 
fected lincRNAs after overexpression. (B) Number of genes showing differential expression (DE) between mock and lincRNA transfection in the SH- 
SY5Y cell line plotted for each of the three development-associated lincRNAs. Genes affected by more than one lincRNA transfection are shown in 
light blue, with independently affected genes in dark blue. The shaded bars show the number of independent protein-coding genes. The solid gray line 
shows the number of differentially expressed genes expected by chance. The two gray dotted lines show the 90% confidence interval for the chance 
distribution. (C) Regulatory effects of HncR-0003 in brain. The red box plots show the medians of the absolute values of the Pearson correlation co- 
efficients based on the developmental expression profile of a lincRNA and its independent protein-coding target genes, identified in SY5Y cells. The 
variance of the median estimates was calculated by bootstrapping target genes 1000 times. The gray boxes show the distribution of the median values of 
the absolute Pearson correlation coefficients based on the developmental expression profiles of lincRNAs and the same number of nontarget protein- 
coding genes sampled 1000 times. (D) Expression profile of lincR-0003 in human (blue) and macaque (green) PFC development. Macaque ages are 
linearly transformed to the human scale. Expression values were z-transformed. Shaded areas indicate standard deviation of expression level estimates. 
(£) Conservation of lincR-0003 target profiles between human and macaque PFC development. The red box plots show the medians of the Pearson 
correlation coefficients based on developmental expression profiles of independent UncR-0003 protein-coding target genes in human and macaque 
PFC. The variance estimate and the background distribution (gray box plot) are as in panel C. 



the three lincRNAs. We, indeed, find a greater than expected 
overlap between affected genes: 18 of 42 transcripts affected 
by lincR-0001, 23 of 56 transcripts affected by lincR-0002, 
and 19 of 103 transcripts affected by lincR-0003 overlapped 
across at least two transfection experiments. These genes 
were excluded from further analyses. 

To test the functional relevance of putative lincRNA tar- 
gets detected in the cell line, we examined lincRNA/target 
coexpression in the human PFC time series. Putative targets 
of lincR-0001 and lincR-0002 did not show significant coex- 
pression with their presumed regulatory lincRNA in human 
PFC (Supplemental Fig. 8). In contrast, there was stronger 
than expected coexpression of lincR-0003 and its 67 indepen- 
dent protein-coding putative target genes expressed in the 



human PFC time series (one-sided Wilcoxon test, P = 
0.014) (Fig. 3C). This effect was not caused by the higher 
expression of putative lincR-0003 targets, as selecting back- 
ground genes with the same expression level distribution 
did not abolish the result (Supplemental Fig. 9). 

We next asked whether the regulatory relationship be- 
tween lincR-0003 and its protein-coding putative target genes 
was preserved in macaque PFC development. Indeed, the ex- 
pression profile of lincR-0003 was more conserved between 
the human and macaque PFC time series than the expression 
profiles of other genes expressed in PFC (r = 0.68) (Fig. 3D). 
Notably, lincR-0003 target genes, predicted based on the 
cell line experiment, also showed more conserved develop- 
mental expression trajectories in human and macaque PFC 
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development than other genes expressed in both species (me- 
dian r = 0.69, one-sided Wilcoxon test, P = 0.0002) (Fig. 3E). 
This observation was not caused by differences in expression 
levels between lincRNA targets and other genes expressed in 
PFC, as selecting background genes with the same expression 
level distribution did not abolish the result (Supplemental 
Fig. 9). Thus, the conservation of lincR-0003 expression be- 
tween human and macaque PFC may have contributed to 
the conservation of the expression trajectories of its putative 
target genes. 

DISCUSSION 

In this study, we quantified the expression of 5835 lincRNAs 
annotated in GENCODE in a human PFC developmental 
time series and further identified 1888 novel lincRNA tran- 
scripts that are not listed in the GENCODE annotation. 
Most of the lincRNAs, both annotated and novel, show low 
DNA sequence conservation, as well as low expression 
levels across the developmental time series. Still, lincRNAs 
showing substantial expression in PFC were significantly 
more conserved than their nonexpressed counterparts. This 
is interesting, as protein- coding genes expressed in brain 
and, specifically in PFC, were shown to be more conserved 
at the DNA sequence level and have greater conservation of 
their promoter region sequences (Jordan et al. 2005; Tuller 
et al. 2008). Our results indicate that the sequences of 
lincRNAs expressed in PFC are also more constrained than 
the sequences of other human lincRNAs. 

More remarkably, while overall sequence conservation and 
ortholog numbers of lincRNAs expressed in human and ma- 
caque PFC development remain low compared to that of pro- 
tein-coding genes, for lincRNAs expressed in both species, 
the conservation of developmental expression profiles be- 
tween humans and macaques was comparable to that of pro- 
tein-coding genes. This finding indicates that despite the low 
conservation at the DNA sequence level and high turnover 
rate of expressed lincRNA genes (Kutter et al. 2012), some 
lincRNAs might play conserved functional roles in primate 
PFC development. Alternatively, this result may show that 
PFC developmental expression patterns are guided by the 
same conserved regulatory mechanisms in humans and 
macaques, resulting in shared lincRNA expression as an evo- 
lutionary neutral by-product. The higher conservation of 
lincRNAs expressed in PFC may, in this case, reflect purifying 
selection removing lincRNA variants that are sufficiently 
deleterious. 

Our cell line results appear, however, to argue against full 
evolutionary neutrality of lincRNA expression in the human 
and macaque PFC. The strongest argument toward the rele- 
vance of expression effects of lincR-0003 identified in the cell 
line comes from the conserved and correlated expression of 
lincR-0003 and its putative target genes in human and ma- 
caque PFC development. It has to be noted, however, that, 
in the absence of further functional studies, this result pro- 



vides no evidence of physiological and functional significance 
of these regulatory effects in either cell line or in the primate 
brain. Still, these findings suggest that at least some of the 
lincRNAs showing conserved expression profiles between 
species may serve as (runs-regulators of specific gene groups 
during PFC development. This notion coincides with previ- 
ous results showing extensive lincRNA- driven trans-regula- 
tion in mouse embryonic stem cells (Guttman et al. 2011). 
More generally, our study illustrates the general usefulness 
of an evolutionary approach to the investigation of 
lincRNA functionality. Further studies of lincRNA expres- 
sion across species and tissues combined with painstaking 
functional studies of individual lincRNAs will be needed to 
assess the functionality of these transcripts. 

MATERIALS AND METHODS 
Ethics statement 

Informed consent for the use of human tissues for research was ob- 
tained in writing from all donors or their next of kin. All nonhuman 
primates used in this study suffered sudden deaths for reasons other 
than their participation in this study and without any relation to the 
tissue used. The Biomedical Research Ethics Committee of the 
Shanghai Institutes for Biological Sciences completed the review 
of the use and care of the animals in the research project (approval 
ID: ER-SIBS-260802P). 

RNA-seq data sets 

Sample material was dissected post-mortem from the prefrontal 
cortex of 38 human and 40 rhesus macaque individuals, with ages 
from 2 d to 61 yr (humans) and 70 d before birth to 21 yr old (ma- 
caques) (Supplemental Table SI). The poly(A)+ RNA fraction was 
sequenced on the Illumina HiSeq 2000 platform, using a standard 
cDNA library preparation protocol. Approximately 378 million hu- 
man and 446 million macaque single-end reads, each 100 nt in 
length, were obtained in total. All data are deposited in the GEO da- 
tabase under accession number GSE5 1264. This data set was used in 
transcriptome reconstruction and gene expression level estimation 
analyses. To validate developmental expression trajectories in hu- 
mans, we further used published RNA-seq time series data from 
the PFC region of 14 healthy humans with ages from 2 d to 98 yr 
(Mazin et al. 2012). 

Human-macaque consensus reference genome 
construction 

Chained and netted alignment files for human (hgl9) and macaque 
(rheMac3), aligned using BLASTZ (Schwartz et al. 2003), were 
downloaded from the UCSC Genome Browser. On the basis of these 
alignments, we constructed a human-macaque consensus genome. 
Specifically, we replaced the human genomic sequences with "N"s 
for all discordant genomic sites, as well as replacing insertions and de- 
letions with "N"s, in the consensus genome. Furthermore, the 6-bp 
regions flanking each insertion or deletion site were masked by "N"s 
in the resulting human-macaque consensus reference genome. 
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Read mapping 

To identify novel human lincRNA genes, sequence reads from the 
prefrontal cortex time series, containing 38 human PFC samples, 
were mapped to the hgl9 reference genome using TopHat v.2.0.6 
(Trapnell et al. 2009), allowing up to two mismatches. No anno- 
tations for genes or splice junctions were used during TopHat 
mapping. The same procedure was applied to the published 
strand-specific RNA-seq time series containing 14 human PFC 
samples (Mazin et al. 2012). Here, we used STAR v.2.3.0 
(Dobin et al. 2013) with default parameters, allowing up to nine 
mismatches, to map sequence reads from the 38 human and 40 
macaque PFC samples to the human-macaque consensus reference 
genome. To test whether mapping to the consensus reference 
genome produces any biases or artifacts, we further mapped 40 
macaque PFC samples to the rheMac3 reference genome using 
TopHat with the above listed parameters. For every mapping 
procedure, only uniquely mapped reads were used in further 
analyses. 



De novo discovery of novel lincRNA candidates 

The pipeline for the de novo identification of lincRNA candidates 
was based on the procedure developed by Cabili et al. (2011). All 
reads uniquely mapped to the human genome were combined using 
SAMtools and assembled to transcripts using Cufflinks v.2.1.1. 
(Trapnell et al. 2010). In order to maximize the sensitivity of tran- 
script detection, the number of required reads for transcript as- 
sembly was lowered to one. In doing so, every mapped read was 
considered in the assembly procedure. The assembled transcripts 
were further filtered using the GENCODE (v. 16) annotation: all 
annotated transcripts and transcripts with exonic overlap to the 
GENCODE annotation were excluded. We further excluded tran- 
scripts that were shorter than 200 bp, as well as transcripts with 
only one exon. 

Protein-coding potential of the obtained novel transcripts was es- 
timated by PhyloCSF (Lin et al. 2011). Transcripts with an open 
reading frame (ORF) scoring higher than 100 in one of all six pos- 
sible reading frames were considered as potentially protein-coding 
and excluded from the list of putative novel human lincRNA. 
Further, potential lincRNA genes were translated in all possible 
reading frames to protein sequences with transseq on the Galaxy 
web server (Giardine et al. 2005; Blankenberg et al. 2010; Goecks 
et al. 2010). The translated sequences were used to search for motifs 
associated with protein-coding genes, using the Pfam database 
(Punta et al. 2012). All transcripts with a significant Pfam-A domain 
in any of the reading frames were excluded from the list of putative 
novel human lincRNAs. 



LincRNA annotation and gene expression 
level estimation 

GENCODE v.16 (Dobin et al. 2013) annotation was used to define 
protein-coding and annotated lincRNA genes. Novel lincRNA gene 
candidates without any overlap to the GENCODE annotation were 
annotated as described above. 

In order to estimate the expression level of each gene in each sam- 
ple, we counted the number of reads that had at least 1 nt overlap 



with at least one exon of this gene. The expression level of gene i 
in sample can be represented as RPKM: 



Here, n - hJ is the number of reads overlapping with the exons of 
gene I, l t is the length of gene /, and Nj is the total number of unique- 
ly mapped reads in sample j. Only genes with maximum RPKM > 1 
and reads detected in more than half of the samples were classified as 
expressed and used in the following analysis. 

Mapping of human transcript annotation 
to the macaque genome using liftOver 

For each gene in the human annotation (GENCODE v.16 includ- 
ing novel lincRNAs identified in this study), we first merged over- 
lapping exons within genes. We then mapped resulting human 
exons to the macaque reference genome (rheMac3) using reciprocal 
liftOver (Hinrichs et al. 2006). The ortholog gene regions within 
the macaque genome were required to contain more than half 
of the merged exons located on the same chromosome and in the 
correct order. 

LincRNAs with age-related expression 
and their expression categories 

To test the effect of age on the expression level of each gene, the poly- 
nomial regression model developed by Somel et al. (2009) was ap- 
plied. Briefly, for each gene the best polynomial regression model, 
with age as a predictor and RPKM as response, was selected in a hi- 
erarchical manner from a linear model up to the third-degree poly- 
nomial. In order to produce a more uniform sample age distribution, 
the original sample ages were transformed using the square root. The 
significance of the regression model was estimated by F-test. Genes 
with age-test P-values smaller than 0.05 after Benjamini-Hochberg 
correction were considered as age-related. The FDR of the age-test 
was estimated by 1000-time random permutation of ages. The medi- 
an of the permutation distribution was used as a null expectation. 

Age-related lincRNAs and protein-coding genes were grouped 
into eight groups, according to all basic expression patterns with 
expression trends separated into early and late developmental inter- 
vals. Specifically, the patterns were designed to discriminate between 
increasing, decreasing, and stable expression during early and late 
developmental stages. Stable expression over the whole lifespan 
was not considered, as it is not age-related. The expression trajectory 
of each gene was correlated to each of the eight patterns using 
Spearman's rank correlation. Each gene was assigned to the best- 
correlated pattern. 

Expression profile comparison between 
species and data sets 

In order to compare expression profiles between humans and ma- 
caques for a given gene with age-related expression in the human 
PFC, we applied cubic spline interpolation, as implemented in the 
smooth.spline function in R, to both the human and the macaque 
time series data to obtain the regressive trajectory. A generalized 
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cross-validation procedure was also implemented in the smooth, 
spline R function to choose suitable smoothing parameters to avoid 
overfitting (Green and Silverman 1994). The age points of macaque 
PFC data were transformed to the human scale by multiplying the 
age by three, as the approximate human lifespan is about three times 
longer than the lifespan of a macaque. The regression curves were 
compared between species by calculating the Pearson correlation 
coefficient based on values interpolated from the fitted spline curves 
at 40 uniformly distributed time points. The same procedure was 
used to estimate the reproducibility between two human data sets. 

Taking the expression differences between lincRNAs and protein- 
coding genes into account, we subsampled protein-coding genes ac- 
cording to the log 10 -transformed RPKM distribution of age-related 
human lincRNA genes. Therefore, the interval between minimum 
and maximum log 10 -transformed RPKM values of lincRNA was 
split into 20 equal bins, and the number of lincRNA in each bin 
was counted. Protein-coding genes were picked according to their 
RPKM values, to construct a gene subset resembling the expression 
distribution of the lincRNA data set. This subsampling was applied 
1000 times in order to estimate variance and confidence interval. 



LincRNA transfection experiments 

The lincRNAs were overexpressed as previously described (Tsai et al. 
2010; Yang et al. 2011). Briefly, three lincRNA genes showing age- 
related expression patterns in human PFC were randomly chosen 
among all 409 age-related human lincRNAs. The three full-length 
lincRNA sequences were separately cloned into the pRNAT- 
CMV3.2 Puro vector under control of a CMV promoter. SH- 
SY5Y neuroblastoma cells were cultured in a 1:1 mixture of 
Ham's F12 medium and Dulbecco's modified Eagle's medium 
(DMEM) supplemented with 10% fetal bovine serum (Hyclone) 
at 37°C in a 5% C0 2 atmosphere. Cells were transiently transfected 
with 1 ug plasmids using Lipofectamine 2000 reagent (Invitrogen) 
according to the manufacturer's protocol at ~80% confluency in 
six-well plates. After 4 h incubation, the transfection mixture was re- 
placed with 2 ml complete medium. 

Transfection was carried out in triplicate for each lincRNA, as well 
as for a mock transfection (empty vector) . Cells were harvested, and 
total RNA was extracted with Trizol reagent (Invitrogen) 24 h after 
transfection. In addition, cells were harvested, and RNA was isolated 
at the 0 h transfection time point (immediately before transfection). 
Isolated total RNA from each of the 15 samples (three lincRNA 
transfection triplicates and one mock transfection triplicate collect- 
ed 24 h after transfection, as well as one triplicate of cells collected at 
0 h) was poly( A) -enriched and converted into an Illumina sequenc- 
ing library using the TruSeq RNA Sample Prep Kit. Libraries were 
quantified with the Qubit dsDNA BR Assay Kit (Invitrogen) and 
qualified using an Agilent Technologies 2100 Bioanalyzer (Agilent). 
Pooled barcoded libraries were sequenced on the Illumina HiSeq 
2000 platform using the 100-bp single-read module. 

After mapping and gene expression estimation (in read counts) 
as described above, edgeR (Robinson et al. 2010), a Bioconductor 
package for differential expression analysis, was applied to identify 
genes differentially expressed between the mock transfection and 
each of the lincRNA transfections (FDR< 10%, Benjamini-Hoch- 
berg correction). 

To estimate the FDR of potential lincRNA targets in the transfec- 
tion experiments, we randomly selected two groups each with three 



samples from the 12 transfected samples and including the mock 
transfection. Assuming no significant difference between these two 
random groups, edgeR was applied with the same P-value cutoff to 
obtain the number of differentially expressed genes. This process 
was repeated 1000 times to estimate the FDR for each transfection. 

DATA DEPOSITION 

All RNA-seq data are deposited in the Gene Expression Omnibus 
(GEO) database under accession number GSE51264. 



SUPPLEMENTAL MATERIAL 

Supplemental material is available for this article. 
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